Reweighted wake-sleep deep generative model example #3
Pending merge of probcomp/Gen.jl#417 into Gen.jl

(Currently the rws_mnist/ project depends on the branch for probcomp/Gen.jl#417, which adds support for multi-threaded gradient estimation and removes some unnecessary parameter allocations. Before this PR is merged, the branch of Gen used in rws_mnist/ should be changed to master.)

Some conclusions:
It is possible to use Gen to successfully train the 10-200-200 generative model (with stochastic hidden layers) and the associated inference network from https://arxiv.org/abs/1406.2751 on the binarized MNIST data in roughly a day or two, without a GPU and without vectorizing the model, using multi-threaded gradient estimation. The Gen implementation is considerably higher-level and easier to follow than the corresponding implementation in Theano.
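For reference, the core quantity in the reweighted wake-sleep estimator used here is the set of self-normalized importance weights over K particles drawn from the inference network: w_k ∝ p(x, z_k) / q(z_k | x). A minimal, library-free sketch of that computation (the function name and toy inputs are illustrative, not from the rws_mnist/ code):

```python
import math

def normalized_importance_weights(log_p_joint, log_q):
    """Self-normalized importance weights from K particles.

    log_p_joint[k] = log p(x, z_k) under the generative model,
    log_q[k]       = log q(z_k | x) under the inference network.
    """
    log_w = [lp - lq for lp, lq in zip(log_p_joint, log_q)]
    m = max(log_w)  # subtract the max (log-sum-exp trick) for numerical stability
    total = sum(math.exp(lw - m) for lw in log_w)
    return [math.exp(lw - m) / total for lw in log_w]

# Two particles with identical log-weights get uniform weight:
w = normalized_importance_weights([-1.0, -1.0], [-2.0, -2.0])  # → [0.5, 0.5]
```

The wake-phase gradient estimate for the generative model's parameters is then the w_k-weighted sum of per-particle score-function terms; the sleep/wake-q phases reuse the same particles to train the inference network.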
In some preliminary experiments, multi-threaded gradient estimation currently gives a significant speedup (>4x) for minibatches of size 16-32 on a c4.8xlarge EC2 instance. More thorough benchmarking, including on bare-metal instances, would be helpful.
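The parallelization pattern is simple because the per-trace gradient estimates in a minibatch are independent: fan the examples out across threads and average the results. A hedged sketch of that fan-out/average structure (the `estimate_gradient` placeholder stands in for Gen's per-trace gradient estimation; note that in CPython, threads only help when the per-example work releases the GIL, whereas Julia's threads parallelize native code directly):

```python
from concurrent.futures import ThreadPoolExecutor

def estimate_gradient(example):
    # Placeholder for a per-example (per-trace) gradient estimate;
    # here a toy scalar gradient for illustration only.
    return 2.0 * example

def minibatch_gradient(examples, n_threads=4):
    """Average independent per-example gradient estimates.

    Each example is a separate task, so the estimates can be computed
    in parallel across threads and then averaged into one update.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        grads = list(pool.map(estimate_gradient, examples))
    return sum(grads) / len(grads)

g = minibatch_gradient([1.0, 2.0, 3.0, 4.0])  # → 5.0
```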
Some profiling of this benchmark, and optimization of Gen against it on large multi-core cloud instances, would be helpful, since the performance gains are likely to carry over to other relevant use cases, such as learning generative models and inference networks (perhaps with comparable or somewhat smaller neural networks) whose traces have stochastic structure. (The case for non-vectorized, CPU-based gradient estimation is most compelling when the structure is highly stochastic, e.g. for Bayesian program synthesis, where vectorization is more difficult and the throughput advantage of a GPU is reduced.)